ABSTRACT

Fingerprint recognition has drawn a lot of attention during
the last decades. Different features and algorithms have been
used for fingerprint recognition in the past. In this paper,
a powerful image representation called the scattering
transform/network is used for recognition. The scattering network
is a convolutional network whose architecture and filters
are predefined wavelet transforms. The first layer of the
scattering representation is similar to SIFT descriptors, and the
higher layers capture higher-frequency content of the signal.
After extraction of the scattering features, their dimensionality
is reduced by applying principal component analysis (PCA).
At the end, a multi-class SVM is used to perform template
matching for the recognition task. The proposed scheme is
tested on a well-known fingerprint database and has shown
promising results, with a best accuracy rate of 98%.

1. INTRODUCTION 

To make an application more secure and less accessible to
undesired people, we need to be able to distinguish a person
from others. There are various ways to identify a person,
such as keys, passwords and cards. However, biometrics are
the most secure options so far. They are virtually impossible
to imitate by anyone other than the desired person. They
can be divided into two categories: behavioral and physiological
features. Behavioral features are those actions that a person
can uniquely create or express, such as signatures and walking
rhythm, and physiological features are those characteristics
that a person possesses, such as fingerprints and iris
pattern. Many works revolve around recognition and
categorization of such data including, but not limited to,
fingerprints, faces, palmprints and iris patterns [1]-[5].

Fingerprint is perhaps one of the most popular biometrics.
It has been used in various applications such as
forensics, transaction authentication, etc. [1]. Many of the
algorithms proposed for fingerprint recognition are based on
minutiae matching. The major minutiae features of fingerprint
ridges are ridge ending, bifurcation, and short ridge. In many
of these algorithms, minutiae are extracted from the test and
input fingerprint images, and the number of corresponding
minutiae pairings between these two images is used to verify
the test fingerprint image. In the case of low-quality
fingerprint images, new foreground segmentation approaches
can be used to extract the minutiae from fingerprints with an
enhanced quality [6]. There are also a lot of image
representations and feature-based algorithms for fingerprint
recognition. In [7], Park et al. proposed a fingerprint
recognition system based on SIFT features. They extract SIFT
feature points in scale space and perform matching based on
the texture information around the feature points using the
SIFT operator. Among more recent works, in [8], Cappelli et al.
proposed a new representation based on a 3D data structure
built from minutiae distances and angles, called Minutia
Cylinder-Code (MCC). In [9], Zhao et al. proposed a pore
matching approach to fingerprint recognition. In [10], Zhao
et al. proposed adaptive pore modeling for fingerprint
recognition.

Many biometric recognition systems involve a lot
of pre-processing steps which are specifically designed for
one kind of data, and the final performance largely depends on
the quality of those steps. Most of them use a single-layer
representation of the image, which may not be able to extract
a very discriminative set of features, and some of them may
work well on some datasets but not on others.
Therefore, there have been a lot of efforts to design
supervised or unsupervised feature representations which work
well over various datasets and problems. These image
representations should be invariant with respect to intra-class
variation in the data. The scattering transform/network is
one such representation. The scattering network is a
convolutional network in which the filters and architecture are
predefined wavelet filters [11]. It can be designed such that it
is invariant to a family of transformations and small
deformations [12]. Due to the success of deep scattering
networks in achieving state-of-the-art results in several image
and audio classification benchmarks, it is interesting to know
how this representation works for biometric recognition. In
[14], the scattering transform is used for iris recognition and
achieved a very high accuracy rate. It can also be used for
extraction of features from MRI and other medical images [15].
Here the scattering transformation is used for fingerprint
recognition. One advantage of the deep scattering network is
that the whole architecture is known in advance and it does not
require any learning of the weights, and one can get a very rich
set of features by going up to two levels in this network.
Therefore the proposed algorithm is very fast and can be
implemented in electronic devices in conjunction with
energy-efficient algorithms [16], [17]. After the scattering
features are extracted, their dimensionality is reduced using
PCA [18]. At the end, a multi-class SVM is used to perform
classification using the PCA features. This algorithm is tested
on the well-known PolyU fingerprint database [29] and achieved
a very high accuracy rate. Four sample fingerprint images of
this database are shown in Figure 1.



Fig. 1. Four different fingerprint images 

The rest of this paper is organized as follows. Section 2
describes the features which are used in this work. The
details of the scattering transformation are provided in
Section 2.1 and the PCA algorithm is explained in Section 2.2.
Section 3 contains the explanation of the classification scheme.
The results of our experiments and comparisons with other works
are presented in Section 4, and the paper is concluded in
Section 5.

2. FEATURES 

Images of the same object can vary due to translation,
scale, rotation and illumination changes. These changes in
the images of a single object class are called intra-class
variations, and they make object recognition very difficult in
some scenarios. Therefore it is very important to design image
representations which are invariant to some of these
intra-class variations. Various image descriptors have been
proposed during the past 20 years. SIFT and HOG are two popular
hand-crafted image descriptors which achieved very good
results on several object recognition tasks. Sparse
representation has also been used for extracting features in
image classification tasks [19]-[21]. But some of these
traditional descriptors are not very successful on some of the
more challenging datasets with many object classes and large
intra-class variations. In more recent works, deep neural
networks and dictionary learning approaches have achieved
state-of-the-art results on various datasets, most notably
AlexNet [22], which was trained for the ImageNet competition.
In the deep learning framework, the images are fed as input to
a multi-layer neural network and the network itself figures out
the best way to combine the pixels to maximize accuracy. In
the dictionary learning approach, different algorithms such as
K-SVD or K-LDA are used to learn a set of features which
are suitable for a given training set [23], [24]. In a recent
work, a wavelet-based representation is proposed by Mallat
[12], which is similar to a deep convolutional network where,
instead of learning the filters and representation, predefined
wavelets are used [25]. These wavelets can be adapted such
that they achieve some desired geometric invariance, such as
translation, rotation and scale invariance [11]. The details of
the scattering transformation are described in the following
section.

2.1. Scattering Features 

The scattering operator is a deep convolutional network which
uses wavelet transforms as its filters and can be designed such
that it is invariant to a group of transformations such as
translation, rotation, etc. [12]. The scattering transform
computes local image descriptors with a cascade of three
operations: wavelet decomposition, complex modulus and local
averaging. The scattering transform provides a multi-layer
representation of a signal. As discussed in [13], some other
image descriptors such as SIFT can be obtained by averaging the
amplitude of wavelet coefficients calculated using directional
wavelets. This averaging provides translation invariance to
some extent, but it also reduces the high-frequency
information. The scattering transform is designed such that it
recovers the high-frequency information lost by this averaging.
It can be shown that the coefficients in the first layer of the
scattering transform are similar to SIFT descriptors and the
coefficients in the higher layers contain higher-frequency
information of the image.

We can get different versions of scattering transform 
by modifying it such that it is invariant to a new family of 
transformations. In this work, translation invariant scattering 
transform is used and a brief description of that is provided 
here. 

Suppose we have a signal $f(x)$. The first scattering
coefficient is the average of the signal and can be obtained by
convolving the signal with an averaging filter $\phi_J$ as
$f * \phi_J$. The scattering coefficients of the first layer can
be obtained by applying wavelet transforms at different scales
and orientations, removing the complex phase and taking their
average by $\phi_J$ as shown below:

$$|f * \psi_{j_1,\lambda_1}| * \phi_J$$

where $j_1$ and $\lambda_1$ denote different scales and
orientations. Taking the magnitude of the wavelet coefficients
can be thought of as playing the role of the non-linear pooling
functions used in convolutional neural networks. Note that by
removing the complex phase of the wavelet coefficients we can
make these coefficients insensitive to local translation.
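
As a rough illustration of the first-layer computation, the
following is a minimal 1D sketch in NumPy, not the ScatNet
implementation used in this work. The Gaussian band-pass and
low-pass filters, their parameters and all function names are
assumptions chosen for brevity; the actual transform uses 2D
oriented wavelets.

```python
import numpy as np

def lowpass_response(n, sigma):
    """Frequency response of a Gaussian averaging filter (assumed phi_J)."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs / sigma) ** 2)

def bandpass_response(n, center, bandwidth):
    """Frequency response of a one-sided Gaussian band-pass filter (assumed psi_j)."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * ((freqs - center) / bandwidth) ** 2)

def first_order_scattering(f, centers, sigma_low=0.02):
    """Return one row |f * psi_j| * phi_J per center frequency (circular convolution)."""
    n = f.shape[0]
    f_hat = np.fft.fft(f)
    phi_hat = lowpass_response(n, sigma_low)
    rows = []
    for c in centers:
        # Wavelet filtering followed by the complex modulus.
        modulus = np.abs(np.fft.ifft(f_hat * bandpass_response(n, c, c / 2)))
        # Local averaging with the low-pass filter.
        rows.append(np.real(np.fft.ifft(np.fft.fft(modulus) * phi_hat)))
    return np.array(rows)

signal = np.random.default_rng(0).standard_normal(256)
centers = [0.25, 0.125, 0.0625]  # dyadic center frequencies (assumed)
S1 = first_order_scattering(signal, centers)
S1_shifted = first_order_scattering(np.roll(signal, 5), centers)
```

With circular convolution, shifting the input only shifts each
row of the output, so the spatial average of every row is
unchanged; wider averaging filters make the rows themselves
progressively less sensitive to small shifts.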

Now, to recover the high-frequency contents of the signal,
which are eliminated from the wavelet coefficients of the first
layer by averaging, we can convolve $|f * \psi_{j_1,\lambda_1}|$
with another set of wavelets at scale $j_2 < J$, take the
absolute value of the wavelet coefficients and take the average:

$$||f * \psi_{j_1,\lambda_1}| * \psi_{j_2,\lambda_2}| * \phi_J$$

It can be shown that $|f * \psi_{j_1,\lambda_1}| * \psi_{j_2,\lambda_2}$
is negligible for scales where $2^{j_1} \le 2^{j_2}$. Therefore
we only need to calculate the coefficients for $j_1 > j_2$.


The convolution with $\phi_J$ at the second layer removes high
frequencies and results in locally translation-invariant
second-order coefficients. This high-frequency information can
be restored again by finer scale wavelet coefficients in the
next layers. We can continue this procedure to obtain the
coefficients of the $k$-th layer of the scattering network as:

$$S_{k,J}(f(x)) = |\,||f * \psi_{j_1,\lambda_1}| * \cdots| * \psi_{j_k,\lambda_k}| * \phi_J,
\quad j_k < \cdots < j_2 < j_1 < J, \quad (\lambda_1,\ldots,\lambda_k) \in \Gamma^k$$

It can be shown that the scattering vector of the $k$-th layer
has a size of $p^k \binom{J}{k}$, where $p$ denotes the number of
different orientations and $J$ denotes the number of scales. A
scattering vector is formed as the concatenation of the
coefficients of all layers up to $m$, which has a size of
$\sum_{k=0}^{m} p^k \binom{J}{k}$. In many signal processing
applications, a scattering network with two or three layers
will be enough. At the end, we can extract the mean and
variance from each scattering transform image to form the
scattering feature vector. One can also extract further
information from each image to form the scattering feature
vector.
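
The size of the scattering vector can be checked with a few
lines of Python. This is only a sketch, assuming the layer size
p^k * C(J, k) with p orientations and J scales; the function
name is hypothetical. For 5 scales, 6 orientations and two
layers it reproduces the 391 transformed images reported in
Section 4.

```python
from math import comb

def scattering_size(num_scales, num_orientations, max_layer):
    """Number of scattering coefficients in layers 0..max_layer."""
    return sum(
        num_orientations ** k * comb(num_scales, k)
        for k in range(max_layer + 1)
    )

# Setting used in the experiments: 5 scales, 6 orientations, two layers.
per_layer = [6 ** k * comb(5, k) for k in range(3)]
total = scattering_size(5, 6, 2)
print(per_layer, total)  # [1, 30, 360] 391
```

Taking the mean and variance of each of the 391 images then
gives the 782 scattering features.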

The transformed images of the first and second layers of the
scattering transform for a sample fingerprint image are shown
in Figures 2 and 3. These images are derived by applying a
bank of filters with 5 different scales and 6 orientations.

Fig. 2. The images from the first layer of scattering transform



Fig. 3. The images from the second layer of scattering transform


2.2. Principal Component Analysis 

In a lot of applications, one needs to reduce the dimensionality
of the data to make the algorithm faster and more efficient.
Principal component analysis (PCA) is a powerful algorithm
used for dimensionality reduction [18]. Given a set of
correlated variables, PCA transforms them into another domain
such that the transformed variables are linearly uncorrelated.
These linearly uncorrelated variables are called principal
components. PCA is usually defined in a way that the first
principal component has the largest possible variance, the
second one has the second largest variance and so on. Therefore,
after applying PCA, we can keep only a subset of principal
components with the largest variance to reduce the
dimensionality. PCA has a lot of applications in computer vision
and neuroscience. Eigenface is one representative application
of PCA in computer vision, where PCA is used for face
recognition.

Let us assume we have a dataset of $M$ fingerprint images
and $\{f_1, f_2, \ldots, f_M\}$ denote their features, where
$f_i \in \mathbb{R}^d$. To apply PCA, all features need to be
centered first by subtracting their mean: $z_i = f_i - \bar{f}$,
where $\bar{f} = \frac{1}{M}\sum_{i=1}^{M} f_i$. Then the
covariance matrix of the centered features is calculated as:

$$C = \sum_{i=1}^{M} z_i z_i^T$$

Next the eigenvalues $\lambda_j$ and eigenvectors $v_j$ of the
covariance matrix $C$ are computed. Suppose the $\lambda_j$'s
are sorted in decreasing order. Then each $z_i$ can be written
as $z_i = \sum_{j=1}^{d} (v_j^T z_i) v_j$.
The dimensionality of the data can be reduced by projecting
them on the first $K$ ($\ll d$) principal vectors as:

$$\hat{z}_i = (v_1^T z_i, v_2^T z_i, \ldots, v_K^T z_i) = (a_1, \ldots, a_K)$$

By keeping $K$ principal components, the percentage of retained
variance can be found as
$\sum_{j=1}^{K} \lambda_j / \sum_{j=1}^{d} \lambda_j$. One issue
is how to choose the value $K$, the number of principal
components. One simple way to choose $K$ is to pick the
smallest value such that the above ratio is at least
$\epsilon$, where $\epsilon$ is usually chosen between 95% and
99%.
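
The PCA steps above can be sketched in NumPy as follows. This
is an illustrative sketch rather than the code used in the
experiments; the function names and the synthetic data are
assumptions, and the number of components is taken as the
smallest one whose retained-variance ratio reaches the
requested level.

```python
import numpy as np

def pca_fit(features, retained=0.95):
    """Return (mean, components, eigenvalues); rows of `features` are samples."""
    mean = features.mean(axis=0)
    Z = features - mean                    # centered features z_i as rows
    C = Z.T @ Z                            # unnormalized covariance matrix (d x d)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    # Smallest number of components whose ratio is at least `retained`.
    num_kept = min(int(np.searchsorted(ratio, retained)) + 1, len(eigvals))
    return mean, eigvecs[:, :num_kept], eigvals

def pca_project(features, mean, components):
    """Project centered features onto the kept principal vectors."""
    return (features - mean) @ components

# Synthetic data with very different variances per coordinate (assumed).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) * np.array([10.0, 5.0, 1.0, 0.1, 0.01])
mean, components, eigvals = pca_fit(X, retained=0.95)
projected = pca_project(X, mean, components)
```

On this synthetic data two components are enough to retain 95%
of the variance, and the projected coordinates are uncorrelated.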

3. RECOGNITION ALGORITHM: SUPPORT 
VECTOR MACHINE 

After capturing the features of all people in the dataset, a
classifier should be used to find the closest match of each
test sample. There are various classifiers which can be used
for this task, including the support vector machine (SVM) [26],
the majority voting algorithm and neural networks. In this work
multi-class SVM has been used, which is quite popular for image
classification. A brief overview of SVM for binary
classification is presented here. For further detail and
extensions to multi-class settings we refer the reader to [27].
Let us assume we
want to separate the set of training data $(x_1, y_1),
(x_2, y_2), \ldots, (x_n, y_n)$ into two classes, where
$x_i \in \mathbb{R}^d$ is the feature vector and
$y_i \in \{-1, +1\}$ is the class label. If we assume the two
classes are linearly separable with a hyperplane
$w \cdot x + b = 0$ with no other prior knowledge about the
data, then the optimal hyperplane is the one with the maximum
margin. One can show that the maximum margin hyperplane can be
found by the following optimization problem:

$$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad
y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, n \qquad (1)$$

Since this problem is convex, we can solve it by looking at the
dual problem and introducing Lagrange multipliers $\alpha_i$,
which results in the following classifier:

$$f(x) = \mathrm{sign}\Big(\sum_{i=1}^{n} \alpha_i y_i \, x_i \cdot x + b\Big) \qquad (2)$$

$\alpha_i$ and $b$ are calculated by the SVM learning algorithm.
Interestingly, after solving the dual optimization problem,
most of the $\alpha_i$'s are zero; those data points $x_i$ which
have nonzero $\alpha_i$ are called support vectors. There is
also a soft-margin version of SVM which allows for mislabeled
examples. If there exists no hyperplane that can split the "-1"
and "+1" examples, the soft-margin method will choose a
hyperplane that splits the examples as cleanly as possible,
while still maximizing the distance to the nearest cleanly
split examples [26]. It introduces a penalty term in the primal
optimization problem, with a misclassification penalty of $C$
times the degree of misclassification.
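
To make the dual-form classifier concrete, here is a small
sketch on a toy problem whose solution can be verified by hand.
The data, the multiplier values and the function names are
illustrative assumptions; in practice a library such as LIBSVM
computes the multipliers and the offset.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def svm_decision(x, support_vectors, labels, alphas, b):
    """Dual-form linear SVM: sign(sum_i alpha_i * y_i * <x_i, x> + b)."""
    score = sum(
        a * y * dot(sv, x)
        for a, y, sv in zip(alphas, labels, support_vectors)
    ) + b
    return 1 if score >= 0 else -1

# Toy problem: one point per class on the horizontal axis.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
labels = [+1, -1]
alphas = [0.5, 0.5]  # satisfy sum_i alpha_i * y_i = 0
b = 0.0

# Primal weights recovered from the dual: w = sum_i alpha_i * y_i * x_i
w = [
    sum(a * y * sv[d] for a, y, sv in zip(alphas, labels, support_vectors))
    for d in range(2)
]
# Both training points lie exactly on the margin: y_i * (w . x_i + b) = 1
margins = [y * (dot(w, sv) + b) for y, sv in zip(labels, support_vectors)]
```

Here the recovered hyperplane is the vertical axis, and only the
two support vectors are needed to classify a new point.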

To derive the nonlinear classifier, one can map the data
from the input space into a higher-dimensional feature space
$\mathcal{H}$ as $x \rightarrow \phi(x)$, so that the classes
are linearly separable in the feature space [28]. If we assume
there exists a kernel function $k$ where
$k(x, y) = \phi(x) \cdot \phi(y)$, then we can use the kernel
trick to construct a nonlinear SVM by replacing the inner
product $x \cdot y$ with $k(x, y)$, which results in the
following classifier:

$$f_n(x) = \mathrm{sign}\Big(\sum_{i=1}^{n} \alpha_i y_i \, k(x, x_i) + b\Big) \qquad (3)$$

To derive a multi-class SVM for a set of data with $M$
classes, we can train $M$ binary classifiers, each of which
discriminates one class against all other classes, and choose
the class which classifies the test sample with the greatest
margin (one-vs-all). In another approach, we can train a set of
$\binom{M}{2}$ binary classifiers, each of which separates one
class from another one, and choose the class that is selected
by the most classifiers (one-vs-one). There are also some other
approaches for multi-class SVM.
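
The two decision rules can be sketched as follows. The binary
classifiers themselves are replaced by placeholder scores and a
hypothetical `pairwise_winner` function, so this only
illustrates how their outputs are combined.

```python
from collections import Counter
from itertools import combinations

def one_vs_all(scores):
    """scores[c] is the margin of the 'class c vs rest' classifier."""
    return max(scores, key=scores.get)

def one_vs_one(pairwise_winner, classes):
    """pairwise_winner(a, b) returns the class chosen by the (a, b) classifier."""
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

classes = ["A", "B", "C", "D"]
num_pairs = len(list(combinations(classes, 2)))  # M choose 2 classifiers

# Placeholder classifier: "C" wins every pair it takes part in.
def pairwise_winner(a, b):
    return "C" if "C" in (a, b) else min(a, b)

ova_choice = one_vs_all({"A": -0.3, "B": 1.2, "C": 0.4})
ovo_choice = one_vs_one(pairwise_winner, classes)
```

One-vs-one needs more classifiers, but each is trained on the
data of only two classes.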

4. EXPERIMENTAL RESULTS AND ANALYSIS 

A detailed description of the experimental results is presented
in this section. First, let us describe the parameter values of
our algorithm. For each image, the scattering transform is
applied up to two levels with a set of filter banks with 5
scales and 6 orientations, resulting in 391 transformed images.
From each image the mean and variance are calculated and used
as features, resulting in 782 scattering features. For the
scattering transformation, we used the software implemented by
Mallat's group [30]. Then PCA is applied to all features and the
first 200 PCA features are used for recognition. Multi-class
SVM is used for the template matching. For SVM, we have
used the LIBSVM library [31], and a linear kernel is used with
the penalty cost C = 1.
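
The step from 391 transformed images to 782 features can be
sketched in NumPy as below. The random stack is a placeholder
for the output of the scattering software, and the array shape
and function name are assumptions.

```python
import numpy as np

def scattering_features(transformed):
    """transformed: array of shape (num_images, height, width).

    Returns the per-image means followed by the per-image variances.
    """
    means = transformed.mean(axis=(1, 2))
    variances = transformed.var(axis=(1, 2))
    return np.concatenate([means, variances])

# Placeholder for the 391 transformed images of one fingerprint.
stack = np.random.default_rng(0).random((391, 60, 80))
features = scattering_features(stack)
print(features.shape)  # (782,)
```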

We have tested our algorithm on the PolyU fingerprint
database, which is provided by Hong Kong Polytechnic
University. It contains 1480 images of 148 fingers. The images
of 25 people are used as a validation set for parameter tuning
of our algorithm. Then from the remaining fingers, half of the
images are used for training and the other half for testing. To
make feature extraction faster, we have resized all images to
80 × 60.
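
A possible way to index such a split is sketched below. It
assumes 10 images per finger (1480 / 148), treats the
validation set as 25 fingers, and takes the first fingers and
the first half of each finger's images; none of these choices
is specified in the text.

```python
def split_dataset(num_fingers=148, images_per_finger=10, num_validation=25):
    """Return (validation fingers, training pairs, test pairs).

    Each pair is (finger index, image index). The ordering is an assumption.
    """
    fingers = list(range(num_fingers))
    validation = fingers[:num_validation]
    remaining = fingers[num_validation:]
    half = images_per_finger // 2
    train = [(f, i) for f in remaining for i in range(half)]
    test = [(f, i) for f in remaining for i in range(half, images_per_finger)]
    return validation, train, test

validation, train, test = split_dataset()
print(len(validation), len(train), len(test))  # 25 615 615
```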

Figure 4 shows the recognition rate of the proposed approach
for different numbers of PCA features. Interestingly,
even by using few PCA features, we are able to get a very
high accuracy rate. As can be seen, using 200 PCA features
results in an accuracy rate of around 98%, which does not
increase much by using more PCA features.



Fig. 4. Recognition accuracy as a function of number of PCA
features

The equal error rate (EER) of the proposed algorithm is
also calculated on this dataset. The equal error rate is the
rate at which the acceptance and rejection errors are equal. To
find the EER we have used the minimum distance classifier.
Figure 5 shows the false acceptance rate and false rejection
rate versus the distance threshold. As we can see, using the
proposed scheme an EER of about 8% is achieved.
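
The EER computation can be sketched as a threshold sweep over
match distances. The distances below are made-up toy values and
the function name is hypothetical; a pair is accepted when its
distance does not exceed the threshold.

```python
def equal_error_rate(genuine, impostor):
    """Return (EER estimate, threshold) where FAR and FRR are closest.

    genuine: distances between samples of the same finger.
    impostor: distances between samples of different fingers.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(d <= t for d in impostor) / len(impostor)  # false accepts
        frr = sum(d > t for d in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

genuine = [0.1, 0.2, 0.3, 0.6]
impostor = [0.4, 0.5, 0.7, 0.8]
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)  # 0.25 0.4
```

Raising the threshold lowers the false rejection rate and
raises the false acceptance rate, so the two curves cross at a
single operating point.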

Fig. 5. FAR and FRR versus the distance threshold

Table 1 shows a comparison between the EER of the proposed
scheme and those of some previous works on this
dataset. The proposed approach achieved a smaller EER
compared to the other approaches, but there is still a lot of
room for improvement of the EER on this dataset. This suggests
that the translation-invariant scattering network is more
suitable for fingerprint identification than verification. The
EER can be further improved by using rotation-translation
invariant scattering features and also by using more powerful
classifiers than the minimum distance classifier.


Table 1. A comparison between EER of the proposed scheme
and previous approaches

Method                          Equal error rate
MICPP [9]                       30.45%
Direct Pore Matching [9]        20.49%
Using Only Minutiae [9]         17.68%
Adaptive pore modeling [10]     11.51%
The proposed scheme             8.1%


The experiments are performed using MATLAB 2012 on 
a laptop with Core i5 CPU running at 2.6GHz. It takes around 
97 milliseconds for each image to perform template matching 
using multi-class SVM. 


5. CONCLUSION 

This paper proposed to use a translation-invariant scattering
network for fingerprint recognition. Scattering features are
locally invariant and carry a lot of high-frequency information
which is lost in other descriptors such as SIFT. The
high-frequency information provides great discriminating power
for fingerprint recognition. Then PCA is applied on the features
to reduce dimensionality. At the end, multi-class SVM is used
to perform template matching. This shows the potential of the
scattering network for biometric recognition systems. In the
future, we will investigate applying the proposed set of
features to other biometrics.


Acknowledgments 

The authors would like to thank Stephane Mallat’s research 
group at ENS for providing the software implementation of 
scattering transform. We would also like to thank the CSIE 
group at NTU for providing LIBSVM software. We would 
also like to thank biometric research group at PolyU Hong 
Kong for providing the fingerprint dataset. 

6. REFERENCES 

[1] D. Maltoni, D. Maio, AK. Jain and S. Prabhakar, “Handbook of
fingerprint recognition,” Springer Science and Business Media,
2009.

[2] W. Zhao, R. Chellappa, PJ. Phillips and A. Rosenfeld, “Face
recognition: A literature survey,” ACM Computing Surveys
(CSUR) 35, no. 4: 399-458, 2003.

[3] S. Minaee, AA. Abdolrashidi, “Multispectral palmprint
recognition using textural features,” IEEE Signal Processing in
Medicine and Biology Symposium, 2014.

[4] S. Minaee, AA. Abdolrashidi, “Highly Accurate Multispectral
Palmprint Recognition Using Statistical and Wavelet Features,”
IEEE Signal Processing Workshop, 2015.

[5] KW. Bowyer, KP. Hollingsworth and PJ. Flynn, “A survey of
iris biometrics research: 2008-2010,” Handbook of iris
recognition, Springer London, 15-54, 2013.

[6] S. Minaee and Y. Wang, “Screen Content Image Segmentation
Using Least Absolute Deviation Fitting,” ICIP, IEEE, 2015.

[7] U. Park, S. Pankanti, AK. Jain, “Fingerprint verification using
SIFT features,” SPIE Defense and Security Symposium, 2008.

[8] R. Cappelli, M. Ferrara, D. Maltoni, “Minutia cylinder-code:
A new representation and matching technique for fingerprint
recognition,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, 2010.

[9] Q. Zhao, L. Zhang, D. Zhang, N. Luo, “Direct pore matching for
fingerprint recognition,” In Advances in Biometrics, pp. 597-606,
Springer Berlin Heidelberg, 2009.

[10] Q. Zhao, D. Zhang, L. Zhang, N. Luo, “Adaptive fingerprint
pore modeling and extraction,” Pattern Recognition 43, no. 8:
2833-2844, 2010.

[11] L. Sifre and S. Mallat, “Rotation, scaling and deformation
invariant scattering for texture discrimination,” IEEE Conference
on Computer Vision and Pattern Recognition, 2013.

[12] S. Mallat, “Group invariant scattering,” Communications on
Pure and Applied Mathematics, 2012.

[13] J. Bruna and S. Mallat, “Classification with scattering
operators,” IEEE Conference on Computer Vision and Pattern
Recognition, pp. 1561-1566, 2011.

[14] S. Minaee, AA. Abdolrashidi and Y. Wang, “Iris Recognition
Using Scattering Transform and Textural Features,” IEEE Signal
Processing Workshop, 2015.

[15] S. Minaee, Y. Wang and YW. Lui, “Prediction of longterm
outcome of neuropsychological tests of MTBI patients using
imaging features,” In Signal Processing in Medicine and Biology
Symposium (SPMB), IEEE, 2013.

[16] M. Hosseini, A. Fedorova, J. Peters and S. Shirmohammadi,
“Energy-aware adaptations in mobile 3D graphics,” ACM
Multimedia: 1017-1020, 2012.

[17] M. Hosseini, J. Peters, S. Shirmohammadi, “Energy-budget-
compliant adaptive 3D texture streaming in mobile games,”
Proceedings of the 4th ACM Multimedia Systems Conference,
2013.

[18] H. Abdi and LJ. Williams, “Principal component analysis,”
Wiley Interdisciplinary Reviews: Computational Statistics 2.4:
433-459, 2010.

[19] U. Srinivas, H. Mousavi, C. Jeon, V. Monga, A. Hattel and
B. Jayarao, “SHIRC: A simultaneous sparsity model for
histopathological image representation and classification,” ISBI,
IEEE, 2013.

[20] M. Rahmani and G. Atia, “Randomized Subspace Learning
Approach for High Dimensional Low Rank plus Sparse Matrix
Decomposition,” 49th Asilomar Conference on Signals, Systems,
and Computers, Nov 2015.

[21] HS. Mousavi, U. Srinivas, V. Monga, Y. Suo, M. Dao and TD.
Tran, “Multi-task image classification via collaborative,
hierarchical spike-and-slab priors,” International Conference on
Image Processing, IEEE, 2014.

[22] A. Krizhevsky, I. Sutskever, GE. Hinton, “Imagenet
classification with deep convolutional neural networks,” Advances
in neural information processing systems, 2012.

[23] Q. Zhang and B. Li, “Discriminative K-SVD for dictionary
learning in face recognition,” IEEE Conference on Computer
Vision and Pattern Recognition, 2010.

[24] J. Golmohammady, M. Joneidi, M. Sadeghi, M. Babaie-Zadeh
and C. Jutten, “K-LDA: An algorithm for learning jointly
overcomplete and discriminative dictionaries,” Proceedings of the
European Signal Processing Conference, IEEE, 2014.

[25] J. Bruna and S. Mallat, “Invariant scattering convolution
networks,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, 35.8: 1872-1886, 2013.

[26] C. Cortes and V. Vapnik, “Support-vector networks,” Machine
learning 20.3: 273-297, 1995.

[27] J. Weston, C. Watkins, “Multi-class support vector machines,”
Technical Report CSD-TR-98-04, Department of Computer
Science, Royal Holloway, University of London, May, 1998.

[28] B. Schölkopf, AJ. Smola, “Learning with kernels: Support
vector machines, regularization, optimization, and beyond,” MIT
press, 2002.

[29] http://www.comp.polyu.edu.hk/biometrics/HRF/HRF.htm

[30] http://www.di.ens.fr/data/software/scatnet/

[31] CC. Chang, CJ. Lin, “LIBSVM: A library for support vector
machines,” ACM Transactions on Intelligent Systems and
Technology (TIST) 2.3: 27, 2011.


